When $|N_k| = 1$, one can easily verify that this condition holds. We use mini-batch gradient descent with a batch size of 10. We tune the step sizes and forgetting factors over the interval $(0,1)$ and obtain the best empirical performance with $\mu_k = 0.01$ and $\nu_k = 0.05$ for every normal agent $k$. Byzantine agents are designed to send, at each iteration, a model in which every dimension is a small noisy value drawn from the interval $[0, 0.1]$. Figures 5 and 6a show the mean and range of the average training loss and classification accuracy of the normal agents in three cases: no attack, 10 randomly selected Byzantine agents, and 29 Byzantine agents.
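The two simulation ingredients described above, the Byzantine message and a normal agent's local mini-batch step, can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the Byzantine noise is sampled uniformly (the text only gives the interval $[0, 0.1]$), it models only the local gradient step with step size $\mu_k$, and it omits the forgetting factor $\nu_k$ because this section does not specify how $\nu_k$ enters the update. All function and constant names are hypothetical.

```python
import random
from typing import Callable, List, Sequence

# Values reported in the setup; the constant names themselves are hypothetical.
BATCH_SIZE = 10                   # mini-batch size of every normal agent
MU = 0.01                         # step size mu_k of every normal agent k
NOISE_LOW, NOISE_HIGH = 0.0, 0.1  # per-dimension range of the Byzantine noise


def byzantine_message(dim: int, rng: random.Random) -> List[float]:
    """Model sent by a Byzantine agent: every coordinate is small noise in [0, 0.1].

    Assumption: uniform sampling (the text states only the interval).
    """
    return [rng.uniform(NOISE_LOW, NOISE_HIGH) for _ in range(dim)]


def minibatch_step(
    w: Sequence[float],
    data: Sequence,
    grad_fn: Callable[[Sequence[float], object], List[float]],
    rng: random.Random,
    mu: float = MU,
    batch_size: int = BATCH_SIZE,
) -> List[float]:
    """One local mini-batch gradient-descent step: w <- w - mu * (mean gradient).

    grad_fn(w, sample) is a hypothetical per-sample gradient oracle.
    The forgetting factor nu_k is intentionally not modeled here.
    """
    batch = rng.sample(list(data), batch_size)  # draw without replacement
    avg = [0.0] * len(w)
    for sample in batch:
        for i, gi in enumerate(grad_fn(w, sample)):
            avg[i] += gi / batch_size           # running mean of the gradients
    return [wi - mu * gi for wi, gi in zip(w, avg)]
```

For example, `byzantine_message(5, random.Random(0))` returns five values in $[0, 0.1]$, and with a constant gradient oracle returning `[1.0, 2.0]`, one `minibatch_step` from the origin moves the model to approximately `[-0.01, -0.02]`, i.e., one step of size $\mu_k = 0.01$.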